Separating sets of strings by finding matching patterns is almost always hard

Authors

  • Giuseppe Lancia
  • Luke Mathieson
  • Pablo Moscato
Abstract

We study the complexity of searching for a set of patterns that separates two given sets of strings. This problem has applications in a wide variety of areas, most notably data mining, computational biology, and the analysis of the complexity of genetic algorithms. We show that the basic problem is difficult. The problem asks for a small set of patterns such that every string in one set is matched by some pattern and no string in a second set is matched by any pattern. It is NP-complete, W[2]-hard when parameterized by the size of the pattern set, and APX-hard. We then perform a detailed parameterized analysis of the problem, separating tractable from intractable variants. In particular, we show that several parameterizations yield FPT results. These include the size of the pattern set together with the number of strings, and the size of the alphabet together with the number of strings.
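The abstract does not define the pattern language, so the sketch below is a minimal illustration under stated assumptions. It assumes that all strings have the same length over an alphabet Σ. It also assumes that a pattern is a string over Σ ∪ {*}, where `*` matches any single symbol. Neither assumption is taken from this page. The names `matches`, `separates`, and `smallest_separating_set` are hypothetical. The sketch shows that checking whether a candidate pattern set separates the two string sets is cheap. Finding a smallest such set by brute force, however, is exponential in the string length and in the number of patterns, which is consistent with the hardness results stated above.

```python
from itertools import combinations, product

# Assumption: '*' is a single-symbol wildcard and all strings share one length.
WILDCARD = "*"


def matches(pattern: str, s: str) -> bool:
    """True if pattern matches s position by position ('*' matches any symbol)."""
    return len(pattern) == len(s) and all(
        p == WILDCARD or p == c for p, c in zip(pattern, s)
    )


def separates(patterns, good, bad) -> bool:
    """Every good string is matched by some pattern; no bad string by any pattern."""
    covers_good = all(any(matches(p, g) for p in patterns) for g in good)
    avoids_bad = not any(matches(p, b) for p in patterns for b in bad)
    return covers_good and avoids_bad


def smallest_separating_set(good, bad, alphabet, k_max):
    """Brute force over pattern sets of size 1..k_max (exponential time).

    Assumes `good` is non-empty and all strings have equal length.
    Returns the first separating set found, or None if none has size <= k_max.
    """
    length = len(good[0])
    symbols = list(alphabet) + [WILDCARD]
    # A useful pattern matches at least one good string and no bad string.
    useful = [
        "".join(p)
        for p in product(symbols, repeat=length)
        if any(matches("".join(p), g) for g in good)
        and not any(matches("".join(p), b) for b in bad)
    ]
    for k in range(1, k_max + 1):
        for combo in combinations(useful, k):
            if separates(combo, good, bad):
                return list(combo)
    return None


if __name__ == "__main__":
    print(smallest_separating_set(["aab", "abb"], ["bab", "bbb"], "ab", 2))
```

With good strings `aab`, `abb` and bad strings `bab`, `bbb`, the single pattern `a*b` is a separating set. In contrast, `aa` and `bb` against `ab` and `ba` need two patterns, because any pattern containing a wildcard also matches a bad string.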

Related papers

Finding the Smallest Turing Machine Using k log(n) Non-deterministic Guesses

Consider that we are given a number m and two disjoint finite sets of strings A and R. Does there exist a DFA with at most m states that accepts the strings in A and rejects the strings in R? We refer to this problem as the inference problem for DFA’s and denote it by INFDFA. It was shown by E. Mark Gold in [4] that INFDFA is NP-hard. To the best of my knowledge, it is not known whether INFDFA r...

Discovering Best Variable-Length-Don't-Care Patterns

A variable-length-don’t-care pattern (VLDC pattern) is an element of the set Π = (Σ ∪ {⋆})∗, where Σ is an alphabet and ⋆ is a wildcard matching any string in Σ∗. Given two sets of strings, we consider the problem of finding the VLDC pattern that is the most common to one and the least common to the other. We present a practical algorithm to find such best VLDC patterns exactly, powerfully sped up by ...

Optimizing image steganography by combining the GA and ICA

In this study, a novel approach that combines steganography and cryptography for hiding information in digital images as host media is proposed. In the process, secret data is first encrypted using the mono-alphabetic substitution cipher method and then the encrypted secret data is embedded inside an image using an algorithm which combines the random patterns based on Space Fillin...

A New Family of String Classifiers Based on Local Relatedness

This paper introduces a new family of string classifiers based on local relatedness. We use three types of local relatedness measurements, namely, longest common substrings (LCStr’s), longest common subsequences (LCSeq’s), and window-accumulated longest common subsequences (wLCSeq’s). We show that finding the optimal classifier for two given sets of strings (the positive set and the negative set)...

On the Varshamov-Tenengolts construction on binary strings

This paper is motivated by the problem of finding the largest single-deletion-correcting code for binary strings. The Varshamov–Tenengolts construction classifies binary strings into non-overlapping sets, the largest of which is asymptotically the largest single-deletion-correcting code. However, despite the asymptotic optimality, little is known about the quality of the construction as a func...

Journal title:
  • Theor. Comput. Sci.

Volume 665

Publication date: 2017